657 research outputs found

Evolutionary Algorithms for Reinforcement Learning

Author: Grefenstette J. J.
Moriarty D. E.
Schultz A. C.
Publication venue: 'AI Access Foundation'
Publication date: 01/06/2011

There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications

arXiv.org e-Print Archive

Crossref

Sheffield University CLEF 2000 submission - bilingual track: German to English

Author: G. Grefenstette
J. Gonzalo
L. Ballesteros
M. F. Porter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2000

We investigated dictionary-based cross-language information retrieval using lexical triangulation. Lexical triangulation combines the results of different transitive translations. Transitive translation uses a pivot language to translate between two languages when no direct translation resource is available. We took German queries and translated them via Spanish or Dutch into English. We compared the results of retrieval experiments using these queries with other versions created by combining the transitive translations or created by direct translation. Direct dictionary translation of a query introduces considerable ambiguity that damages retrieval, giving an average precision 79% below monolingual retrieval in this research. Transitive translation introduces more ambiguity, giving results more than 88% below direct translation. We have shown that lexical triangulation between two transitive translations can eliminate much of the additional ambiguity introduced by transitive translation
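
As a rough illustration of the lexical triangulation idea in the abstract above, the following sketch translates a German term into English through two pivot languages and keeps only the candidates both routes agree on. The toy dictionaries, the function names `transitive_translate` and `triangulate`, and the choice of set intersection as the combination rule are all assumptions made for this example; the abstract does not specify how the paper combines the translations.

```python
from typing import Dict, Set

Lexicon = Dict[str, Set[str]]

# Hypothetical toy bilingual dictionaries (illustrative only, not real resources).
DE_ES: Lexicon = {"bank": {"banco"}}
ES_EN: Lexicon = {"banco": {"bank", "bench", "shoal"}}
DE_NL: Lexicon = {"bank": {"bank"}}
NL_EN: Lexicon = {"bank": {"bank", "bench", "couch"}}


def transitive_translate(term: str, to_pivot: Lexicon, from_pivot: Lexicon) -> Set[str]:
    """Translate a term into the target language through one pivot language."""
    candidates: Set[str] = set()
    for pivot_word in to_pivot.get(term, set()):
        candidates |= from_pivot.get(pivot_word, set())
    return candidates


def triangulate(term: str) -> Set[str]:
    """Combine two transitive routes by intersection (an assumed combination rule)."""
    via_spanish = transitive_translate(term, DE_ES, ES_EN)
    via_dutch = transitive_translate(term, DE_NL, NL_EN)
    return via_spanish & via_dutch


if __name__ == "__main__":
    print(sorted(transitive_translate("bank", DE_ES, ES_EN)))
    print(sorted(triangulate("bank")))
```

In this toy setup each single pivot route returns three candidates, while the intersection keeps only the two that both routes share, which is the kind of ambiguity reduction the abstract attributes to triangulation.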

CiteSeerX

Crossref

White Rose Research Online

Inheritance-Based Diversity Measures for Explicit Convergence Control in Evolutionary Algorithms

Author: Fortin Félix-Antoine
Grefenstette John J
Mahfoud Samir W
Šenkerik Roman
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/10/2018

Diversity is an important factor in evolutionary algorithms to prevent premature convergence towards a single local optimum. Various means of maintaining diversity throughout the process of evolution exist in the literature. We analyze approaches to diversity that (a) have an explicit and quantifiable influence on fitness at the individual level and (b) require no (or very little) additional domain knowledge such as domain-specific distance functions. We also introduce the concept of genealogical diversity in a broader study. We show that employing these approaches can help evolutionary algorithms for global optimization in many cases. Comment: GECCO '18: Genetic and Evolutionary Computation Conference, 2018, Kyoto, Japan

arXiv.org e-Print Archive

Crossref

Genetic algorithms with elitism-based immigrants for changing optimization problems

Author: D. Parrott
F. Vavak
J. Branke
J.J. Grefenstette
S. Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1

Copyright © Springer-Verlag Berlin Heidelberg 2007. Addressing dynamic optimization problems has been a challenging task for the genetic algorithm community. Over the years, several approaches have been integrated into genetic algorithms to enhance their performance in dynamic environments. One major approach is to maintain the diversity of the population, e.g., via random immigrants. This paper proposes an elitism-based immigrants scheme for genetic algorithms in dynamic environments. In the scheme, the elite from the previous generation is used as the base to create immigrants via mutation, which replace the worst individuals in the current population. This way, the introduced immigrants are more adapted to the changing environment. This paper also proposes a hybrid scheme that combines the elitism-based immigrants scheme with the traditional random immigrants scheme to deal with significant changes. The experimental results show that the proposed elitism-based and hybrid immigrants schemes efficiently improve the performance of genetic algorithms in dynamic environments
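
The immigrant step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the bit-string encoding, bit-flip mutation, the function names `mutate` and `elitism_based_immigrants`, and every parameter value are assumptions introduced for the example. The hybrid scheme with random immigrants is not shown.

```python
import random
from typing import Callable, List

Individual = List[int]  # bit-string encoding, an assumption for this sketch


def mutate(parent: Individual, rate: float, rng: random.Random) -> Individual:
    """Return a copy of the parent with each bit flipped with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in parent]


def elitism_based_immigrants(
    population: List[Individual],
    fitness: Callable[[Individual], float],
    elite: Individual,
    n_immigrants: int,
    mutation_rate: float,
    rng: random.Random,
) -> List[Individual]:
    """Replace the worst individuals with mutated copies of the previous elite."""
    if not 0 <= n_immigrants <= len(population):
        raise ValueError("n_immigrants must be between 0 and the population size")
    # Immigrants are created from the elite of the previous generation via mutation.
    immigrants = [mutate(elite, mutation_rate, rng) for _ in range(n_immigrants)]
    # Rank best-first and drop the worst n_immigrants individuals.
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(population) - n_immigrants]
    return survivors + immigrants


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
    previous_elite = max(pop, key=sum)  # toy fitness: number of ones
    new_pop = elitism_based_immigrants(pop, sum, previous_elite, 2, 0.1, rng)
    print(len(new_pop))
```

The population size stays constant because exactly as many individuals are dropped as immigrants are added.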

CiteSeerX

Crossref

De Montfort University Open Research Archive

Brunel University Research Archive

Leicester Research Archive

Multiple cyclotron line-forming regions in GX 301-2

Author: Falkner S.
Fuerst F.
Grefenstette B.
Kretschmar P.
Marcu-Cheatham D.
Natalucci L.
Pottschmidt K.
Tomsick J.
Walton D. J.
Publication venue: 'EDP Sciences'
Publication date: 15/09/2018

We present two observations of the high-mass X-ray binary GX 301-2 with NuSTAR, taken at different orbital phases and different luminosities. We find that the continuum is well described by typical phenomenological models, like a very strongly absorbed NPEX model. However, for a statistically acceptable description of the hard X-ray spectrum we require two cyclotron resonant scattering features (CRSF), one at ~35 keV and the other at ~50 keV. Even though both features strongly overlap, the good resolution and sensitivity of NuSTAR allow us to disentangle them at >=99.9% significance. This is the first time that two CRSFs are seen in GX 301-2. We find that the CRSFs are very likely independently formed, as their energies are not harmonically related and, if it were a single line, the deviation from a Gaussian shape would be very large. We compare our results to archival Suzaku data and find that our model also provides a good fit to those data. We study the behavior of the continuum as well as the CRSF parameters as a function of pulse phase in seven phase bins. We find that the energy of the 35 keV CRSF varies smoothly as a function of phase, between 30-38 keV. To explain this variation, we apply a simple model of the accretion column, taking the altitude of the line-forming region, the velocity of the in-falling material, and the resulting relativistic effects into account. We find that in this model the observed energy variation can be explained simply by a variation of the projected velocity and beaming factor of the line-forming region towards us. Comment: 18 pages, 10 figures, accepted for publication in A&

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Caltech Authors

Evidence for a Variable Ultrafast Outflow in the Newly Discovered Ultraluminous Pulsar NGC 300 ULX-1

Author: Bachetti M.
Brightman M.
Fabian A. C.
Fürst F.
Grefenstette B. W.
Kosec P.
Pinto C.
Walton D. J.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Ultraluminous pulsars are definite proof that persistent super-Eddington accretion occurs in nature. They support the scenario according to which most Ultraluminous X-ray Sources (ULXs) are super-Eddington accretors of stellar mass rather than sub-Eddington intermediate mass black holes. An important prediction of theories of supercritical accretion is the existence of powerful outflows of moderately ionized gas at mildly relativistic speeds. In practice, the spectral resolution of X-ray gratings such as RGS onboard XMM-Newton is required to resolve their observational signatures in ULXs. Using RGS, outflows have been discovered in the spectra of 3 ULXs (none of which are currently known to be pulsars). Most recently, the fourth ultraluminous pulsar was discovered in NGC 300. Here we report detection of an ultrafast outflow (UFO) in the X-ray spectrum of the object, with a significance of more than 3σ, during one of the two simultaneous observations of the source by XMM-Newton and NuSTAR in December 2016. The outflow has a projected velocity of 65000 km/s (0.22c) and a high ionisation factor with a log value of 3.9. This is the first direct evidence for a UFO in a neutron star ULX and also the first time that such evidence in a ULX spectrum is seen in both soft and hard X-ray data simultaneously. We find no evidence of the UFO during the other observation of the object, which could be explained by either the clumpy nature of the absorber or a slight change in our viewing angle of the accretion flow. Comment: 10 pages, 4 figures. Accepted to MNRA

arXiv.org e-Print Archive

OA@INAF - Istituto Nazionale di Astrofisica

Caltech Authors

Replay-Guided Adversarial Environment Design

Author: Dennis M
Foerster J
Grefenstette E
Jiang M
Parker-Holder J
Rocktäschel T
Publication venue: Neural Information Processing Systems
Publication date: 01/01/2021

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR ⊥ , obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR ⊥ improves the performance of PAIRED, from which it inherited its theoretical framework

UCL Discovery

Improving Policy Learning via Language Dynamics Distillation

Author: Grefenstette E
Mu J
Rocktäschel T
Zettlemoyer L
Zhong V
Publication venue: NeurIPS
Publication date: 01/01/2022

Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, VAE pretraining, and methods that learn from unlabeled demonstrations in inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample-efficiency and generalization across environments, and that dynamics modeling with expert demonstrations is more effective than with non-experts

UCL Discovery

Canalization and Symmetry in Boolean Models for Genetic Regulatory Networks

Author: C J Olson Reichhardt
Grefenstette J
Harrison M A
Kauffman S A
Kevin E Bassler
Pólya G
Waddington C H
Walker C C
Publication venue: 'IOP Publishing'
Publication date: 02/03/2007

Canalization of genetic regulatory networks has been argued to be favored by evolutionary processes due to the stability that it can confer to phenotype expression. We explore whether a significant amount of canalization and partial canalization can arise in purely random networks in the absence of evolutionary pressures. We use a mapping of the Boolean functions in the Kauffman N-K model for genetic regulatory networks onto a k-dimensional Ising hypercube to show that the functions can be divided into different classes strictly due to geometrical constraints. The classes can be counted and their properties determined using results from group theory and isomer chemistry. We demonstrate that partially canalized functions completely dominate all possible Boolean functions, particularly for higher k. This indicates that partial canalization is extremely common, even in randomly chosen networks, and has implications for how much information can be obtained in experiments on native state genetic regulatory networks. Comment: 14 pages, 4 figures; version to appear in J. Phys.
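
To make the notion of canalization in the abstract above concrete, the sketch below enumerates every Boolean function of k inputs by brute force and counts those that are canalizing in the standard sense: some input, when held at one particular value, fixes the output regardless of the other inputs. This is only an illustration under that assumed definition (constant functions count as canalizing here). It does not reproduce the paper's hypercube mapping, its symmetry classes, or its notion of partial canalization, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

TruthTable = Dict[Tuple[int, ...], int]


def is_canalizing(table: TruthTable, k: int) -> bool:
    """True if fixing some input i to some value v forces a single output."""
    for i in range(k):
        for v in (0, 1):
            outputs = {out for inputs, out in table.items() if inputs[i] == v}
            if len(outputs) == 1:
                return True
    return False


def count_canalizing(k: int) -> int:
    """Count canalizing functions among all 2**(2**k) Boolean functions of k inputs."""
    inputs = list(product((0, 1), repeat=k))
    count = 0
    for outputs in product((0, 1), repeat=len(inputs)):
        if is_canalizing(dict(zip(inputs, outputs)), k):
            count += 1
    return count


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, count_canalizing(k), 2 ** (2 ** k))
```

The enumeration grows doubly exponentially in k, so this brute-force approach is only practical for very small k; for k = 2 the only functions it rejects are XOR and its negation.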

arXiv.org e-Print Archive

Crossref

Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

Author: Grefenstette E
Matthews M
Parker-Holder J
Rocktäschel T
Samvelyan M
Publication venue: Proceedings of Machine Learning Research (PMLR)
Publication date: 01/01/2022

Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope being that useful skills will be implicitly learned in order to maximise discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with a prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards

UCL Discovery